Lecture 19
\[ y(s_i)|\boldsymbol{y}_{-s_i} \sim N\left(X_{i\cdot}\beta + \phi\sum_{j=1}^n \frac{A_{ij}}{D_{ii}} ~ \big(y(s_j)-X_{j\cdot}\beta\big),~ \sigma^2 D^{-1}_{ii} \right) \]
\[ \boldsymbol{y} \sim N(\boldsymbol{X}\boldsymbol{\beta},~\sigma^2(\boldsymbol{D}-\phi \boldsymbol{A})^{-1}) \]
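The conditional and joint forms above are equivalent; a quick numeric check on a toy graph (a 4-node path, with illustrative values of \(\phi\) and \(\sigma^2\), and \(\beta = 0\)) confirms that the joint precision \(\sigma^{-2}(\boldsymbol{D}-\phi\boldsymbol{A})\) reproduces the stated conditional mean and variance:

```r
# Toy check: CAR full conditionals recovered from the joint precision.
# Assumptions: 4-node path graph, phi = 0.3, sigma2 = 1, beta = 0.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)                      # adjacency matrix of a path
D <- diag(rowSums(A))              # D_ii = number of neighbors
phi <- 0.3; sigma2 <- 1
Q <- (D - phi * A) / sigma2        # joint precision of y

# Conditional variance of y_1 given the rest is 1 / Q_11 = sigma2 / D_11
stopifnot(all.equal(1 / Q[1, 1], sigma2 / D[1, 1]))

# Conditional mean of y_1 matches phi * sum_j (A_1j / D_11) y_j
y <- c(0.5, -1, 2, 0.3)
cond_mean <- -sum(Q[1, -1] * y[-1]) / Q[1, 1]
stopifnot(all.equal(cond_mean, phi * sum(A[1, ] * y) / D[1, 1]))
```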
Let us consider what happens to our derivation of the SAR model when we include an \(\boldsymbol X\boldsymbol\beta\) term in our model formula.
\[ \begin{aligned} \boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} \boldsymbol{y} + \boldsymbol{\epsilon} \\ \boldsymbol{\epsilon} \sim N(\boldsymbol{0},\, \sigma^2 \boldsymbol{D}^{-1}) \end{aligned} \]
\[ \begin{aligned} \boldsymbol{y} &= \boldsymbol{X}\boldsymbol{\beta} + \phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A} \, \boldsymbol{y} + \boldsymbol{\epsilon} \\ \boldsymbol{y} - \phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A} \, \boldsymbol{y} &= \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \\ (I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A}) \, \boldsymbol{y} &= \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \end{aligned} \]
\[ \boldsymbol{y} = (I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1} \boldsymbol{X}\boldsymbol{\beta} + (I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1} \boldsymbol{\epsilon} \]
\[ \begin{aligned} E(\boldsymbol{y}) &= (I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1} \boldsymbol{X}\boldsymbol{\beta} \\ \end{aligned} \]
\[ \begin{aligned} Var(\boldsymbol{y}) &= \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right) \sigma^2 \boldsymbol{D}^{-1} \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right)^{t} \\ &= \sigma^2 \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right) \boldsymbol{D}^{-1} \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right)^{t} \\ \end{aligned} \]
This is the same behavior we saw with the AR(1) model, where the mean of the process is \(\delta / (1-\phi)\) and not \(\delta\).
This is a relatively minor issue but it does have practical implications when it comes to interpretation of the model parameters (particularly the slope coefficients \(\boldsymbol \beta\)).
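To make the interpretation issue concrete, here is a small sketch (toy 4-node path graph; \(\phi\) and \(\beta_0\) are illustrative): because \(\boldsymbol{D}^{-1}\boldsymbol{A}\) is row-stochastic, an intercept-only SAR lag model has mean \(\beta_0/(1-\phi)\) in every region, not \(\beta_0\):

```r
# Assumption: toy 4-node path graph. D^{-1} A has rows summing to 1, so
# for an intercept-only SAR lag model E(y_i) = beta0 / (1 - phi),
# mirroring the AR(1) mean delta / (1 - phi).
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
D_inv <- diag(1 / rowSums(A))
phi <- 0.5; beta0 <- 2

mu <- solve(diag(4) - phi * D_inv %*% A, rep(beta0, 4))  # E(y)
stopifnot(all.equal(mu, rep(beta0 / (1 - phi), 4)))      # 4, not 2
```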
The previous model is sometimes referred to as the SAR lag model. A variation of this model, common in spatial econometrics, is the SAR error model, which is defined as follows,
\[ \begin{aligned} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{u} \\ \boldsymbol{u} &= \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} \boldsymbol{u} + \boldsymbol{\epsilon} \\ \boldsymbol{\epsilon} &\sim N(\boldsymbol{0},\, \sigma^2 \boldsymbol{D}^{-1}) \end{aligned} \]
As with the SAR lag model we can solve for \(\boldsymbol{u}\),
\[ \begin{aligned} \boldsymbol{u} &= \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} \boldsymbol{u} + \boldsymbol{\epsilon} \\ \boldsymbol{u} &- \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} \boldsymbol{u} = \boldsymbol{\epsilon} \\ \boldsymbol{u} &= (I - \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} )^{-1} \boldsymbol{\epsilon} \\ \end{aligned} \]
\[ \begin{aligned} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + (I - \phi \boldsymbol{D}^{-1} \, \boldsymbol{A} )^{-1} \boldsymbol{\epsilon} \\ \end{aligned} \]
\[ \begin{aligned} E(\boldsymbol{y}) &= \boldsymbol{X}\boldsymbol{\beta} \\ \end{aligned} \]
\[ \begin{aligned} Var(\boldsymbol{y}) &= \sigma^2 \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right) \boldsymbol{D}^{-1} \left((I-\phi \, \boldsymbol{D}^{-1} \, \boldsymbol{A})^{-1}\right)^{t} \\ \end{aligned} \]
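For intuition, a draw from the SAR error model can be simulated directly from this form (a sketch on a toy 4-node path graph; \(\phi\), \(\sigma\), \(X\), and \(\beta\) are illustrative assumptions):

```r
# Sketch: one draw from the SAR error model y = X beta + u,
# u = (I - phi D^{-1} A)^{-1} eps, eps ~ N(0, sigma^2 D^{-1}).
set.seed(1)
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
D <- diag(rowSums(A))
phi <- 0.5; sigma <- 1
X <- cbind(1, rnorm(4)); beta <- c(2, -1)

eps <- rnorm(4, sd = sigma / sqrt(diag(D)))       # eps ~ N(0, sigma^2 D^{-1})
u <- solve(diag(4) - phi * solve(D) %*% A, eps)   # spatially correlated errors
y <- X %*% beta + u                               # E(y) = X beta, unlike the lag model
```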
BIR74 vs SID74 (figure)
spdep + spatialreg
Characteristics of weights list object:
Neighbour list object:
Number of regions: 100
Number of nonzero links: 490
Percentage nonzero weights: 4.9
Average number of links: 4.9
Weights style: M
Weights constants summary:
n nn S0 S1 S2
M 100 10000 490 980 10696
Moran I test under randomisation
data: 1000 * nc$SID74/nc$BIR74
weights: listW
Moran I statistic standard deviate = 3.6355,
p-value = 0.0001387
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation
0.210046454 -0.010101010
Variance
0.003666802
Geary C test under randomisation
data: nc$SID74
weights: listW
Geary C statistic standard deviate =
0.91949, p-value = 0.1789
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation
0.88988684 1.00000000
Variance
0.01434105
Geary C test under randomisation
data: 1000 * nc$SID74/nc$BIR74
weights: listW
Geary C statistic standard deviate = 3.0989,
p-value = 0.0009711
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation
0.67796679 1.00000000
Variance
0.01079878
nc_car = spatialreg::spautolm(
formula = 1000*SID74/BIR74 ~ 1, data = nc,
listw = listW, family = "CAR"
)
summary(nc_car)
Call: spatialreg::spautolm(formula = 1000 * SID74/BIR74 ~ 1, data = nc,
listw = listW, family = "CAR")
Residuals:
Min 1Q Median 3Q Max
-2.13872 -0.83535 -0.22355 0.55014 7.68640
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.00203 0.24272 8.2484 2.22e-16
Lambda: 0.13062 LR test value: 8.6251 p-value: 0.0033157
Numerical Hessian standard error of lambda: 0.030475
Log likelihood: -182.3989
ML residual variance (sigma squared): 2.1205, (sigma: 1.4562)
Number of observations: 100
Number of parameters estimated: 3
AIC: NA (not available for weighted model)
nc_sar_err = spatialreg::spautolm(
formula = 1000*SID74/BIR74 ~ 1, data = nc,
listw = listW, family = "SAR"
)
summary(nc_sar_err)
Call: spatialreg::spautolm(formula = 1000 * SID74/BIR74 ~ 1, data = nc,
listw = listW, family = "SAR")
Residuals:
Min 1Q Median 3Q Max
-2.09307 -0.87039 -0.20274 0.51156 7.62830
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.01084 0.23622 8.5127 < 2.2e-16
Lambda: 0.079934 LR test value: 8.8911 p-value: 0.0028657
Numerical Hessian standard error of lambda: 0.024597
Log likelihood: -182.2659
ML residual variance (sigma squared): 2.1622, (sigma: 1.4704)
Number of observations: 100
Number of parameters estimated: 3
AIC: NA (not available for weighted model)
nc_sar_lag = spatialreg::lagsarlm(
formula = 1000*SID74/BIR74 ~ 1, data = nc,
listw = listW
)
summary(nc_sar_lag)
Call:spatialreg::lagsarlm(formula = 1000 * SID74/BIR74 ~ 1, data = nc,
listw = listW)
Residuals:
Min 1Q Median 3Q Max
-2.03968 -1.10398 -0.14413 0.54949 7.70889
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.40320 0.27296 5.1406 2.738e-07
Rho: 0.063331, LR test value: 7.5512, p-value: 0.0059971
Asymptotic standard error: 0.021655
z-value: 2.9246, p-value: 0.0034493
Wald statistic: 8.5531, p-value: 0.0034493
Log likelihood: -182.9358 for lag model
ML residual variance (sigma squared): 2.2232, (sigma: 1.491)
Number of observations: 100
Number of parameters estimated: 3
AIC: 371.87, (AIC for lm: 377.42)
LM test for residual autocorrelation
test value: 1.3279, p-value: 0.24917
Moran I test under randomisation
data: residuals(nc_car)
weights: listW
Moran I statistic standard deviate = -1.7952, p-value = 0.07261
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
-0.117449312 -0.010101010 0.003575538
Moran I test under randomisation
data: residuals(nc_sar_err)
weights: listW
Moran I statistic standard deviate = 0.17958, p-value = 0.8575
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
0.0006769267 -0.0101010101 0.0036020941
Moran I test under randomisation
data: residuals(nc_sar_lag)
weights: listW
Moran I statistic standard deviate = 0.88877, p-value = 0.3741
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
0.043255233 -0.010101010 0.003604055
This is the transformation used by Cressie and Read in Spatial Data Analysis of Regional Counts (1989).
\[ FT = \sqrt{1000} \left( \sqrt{\frac{SID74}{BIR74}} + \sqrt{\frac{SID74+1}{BIR74}} \right) \]
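Applied to the nc data, the Freeman–Tukey transform can be computed directly (written as a small helper here for illustration):

```r
# Freeman-Tukey transform of the SIDS rate (per 1000 births)
ft <- function(sid, bir) {
  sqrt(1000) * (sqrt(sid / bir) + sqrt((sid + 1) / bir))
}

# e.g. nc$FT <- ft(nc$SID74, nc$BIR74)
ft(1, 1000)   # 1 + sqrt(2)
```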
Moran I test under randomisation
data: residuals(nc_car_sqrt)
weights: listW
Moran I statistic standard deviate = -3.1196, p-value = 0.9991
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
-0.200890550 -0.010101010 0.003740354
Moran I test under randomisation
data: residuals(nc_sar_err_sqrt)
weights: listW
Moran I statistic standard deviate = -0.422, p-value = 0.6635
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
-0.035976084 -0.010101010 0.003759585
Moran I test under randomisation
data: residuals(nc_sar_lag_sqrt)
weights: listW
Moran I statistic standard deviate = 4.7893, p-value = 8.368e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.285100348 -0.010101010 0.003799187
Family: gaussian
Links: mu = identity; sigma = identity
Formula: 1000 * SID74/BIR74 ~ 1 + car(A, gr = NAME)
Data: nc (Number of observations: 100)
Draws: 4 chains, each with iter = 20000; warmup = 10000; thin = 10;
total post-warmup draws = 4000
Correlation Structures:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
car 0.75 0.16 0.36 0.98 1.00 912 593
sdcar 2.70 0.45 1.60 3.39 1.01 352 579
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.06 0.41 1.43 2.76 1.01 423 467
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.54 0.33 0.07 1.24 1.03 133 84
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
stan_car = rstan::stan_model(
model_code = "
data {
int<lower=1> n;
int<lower=1> p;
matrix[n, p] X;
vector[n] y;
matrix<lower=0, upper=1>[n, n] W;
matrix[n, n] D;
matrix[n, n] D_inv;
}
parameters {
vector[p] beta;
real<lower=0> tau;
real<lower=0, upper=1> alpha;
}
transformed parameters {
vector[n] y_cond = X * beta + alpha * D_inv * W * (y - X * beta);
real<lower=0> sigma2 = 1/tau;
}
model {
y ~ multi_normal_prec(X * beta, tau * (D - alpha * W));
beta ~ normal(0, 1);
tau ~ gamma(2, 2);
}
"
)
# A tibble: 4 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 beta[1] 1.88 1.92 0.338 0.273 1.32 2.34 1.00 3872. 3961.
2 tau 0.115 0.114 0.0168 0.0168 0.0884 0.143 1.00 4032. 3692.
3 sigma2 8.92 8.80 1.34 1.29 6.98 11.3 1.00 4032. 3692.
4 alpha 0.723 0.744 0.150 0.150 0.439 0.932 1.00 3922. 3960.
\[ \begin{aligned} y &\sim N(X\beta + w,~ \sigma^2) \\ w &\sim N(0,~ \sigma^2_{CAR} (D - \alpha W)^{-1}) \end{aligned} \]
\[ y \sim N(X\beta + w, \sigma^2 (D - \alpha W)^{-1}) \]
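The two formulations differ in whether the CAR structure enters through a latent term \(w\) with an independent nugget \(\sigma^2\), or directly through the error covariance; marginalizing over \(w\) in the first formulation makes the difference explicit,

\[ y \sim N\big(X\beta,~ \sigma^2 I + \sigma^2_{CAR} (D - \alpha W)^{-1}\big) \]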
Family: gaussian
Links: mu = identity; sigma = identity
Formula: 1000 * SID74/BIR74 ~ 1 + sar(W, type = "error")
Data: nc (Number of observations: 100)
Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 2;
total post-warmup draws = 4000
Correlation Structures:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
errorsar 0.41 0.12 0.18 0.63 1.00 3381 3378
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.05 0.26 1.54 2.58 1.00 3659 3041
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.49 0.11 1.30 1.71 1.00 3814 3384
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
predict()
If instead we use \(\boldsymbol{X}\boldsymbol{\beta} + \phi \boldsymbol{W}\boldsymbol{y}\), we get the following:
p = b_sar_err |>
tidybayes::spread_draws(b_Intercept, errorsar) |>
filter(.chain == 1) |>
mutate(
y_cond = map2(
b_Intercept, errorsar,
~ .x + .y * W %*% (1000 * (nc$SID74 / nc$BIR74))
)
) |>
pull(y_cond) |>
do.call(cbind, args = _)
tibble(
y_cond_mean = apply(p, 1, mean),
y_cond_q025 = apply(p, 1, quantile, 0.025),
y_cond_q975 = apply(p, 1, quantile, 0.975)
) |>
ggplot(aes(x=1000 * (nc$SID74 / nc$BIR74), y=y_cond_mean)) +
geom_abline(intercept=0, slope=1, color="grey") +
geom_point() +
geom_errorbar(aes(ymin=y_cond_q025, ymax=y_cond_q975), alpha=0.25) +
coord_fixed() +
labs(x = "Observed", y = "Predicted")
Family: gaussian
Links: mu = identity; sigma = identity
Formula: 1000 * SID74/BIR74 ~ 1 + sar(W, type = "lag")
Data: nc (Number of observations: 100)
Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 2;
total post-warmup draws = 4000
Correlation Structures:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
lagsar 0.39 0.12 0.14 0.61 1.00 2561 2779
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.27 0.29 0.74 1.84 1.00 2489 2675
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.49 0.11 1.29 1.73 1.00 3224 3138
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
predict()
If instead we use \(\boldsymbol{X}\boldsymbol{\beta} + \phi \boldsymbol{W}\boldsymbol{y}\), we get the following:
p = b_sar_lag |>
tidybayes::spread_draws(b_Intercept, lagsar) |>
filter(.chain == 1) |>
mutate(
y_cond = map2(
b_Intercept, lagsar,
~ .x + .y * W %*% (1000 * (nc$SID74 / nc$BIR74))
)
) |>
pull(y_cond) |>
do.call(cbind, args = _)
tibble(
y_cond_mean = apply(p, 1, mean),
y_cond_q025 = apply(p, 1, quantile, 0.025),
y_cond_q975 = apply(p, 1, quantile, 0.975)
) |>
ggplot(aes(x=1000 * (nc$SID74 / nc$BIR74), y=y_cond_mean)) +
geom_abline(intercept=0, slope=1, color="grey") +
geom_point() +
geom_errorbar(aes(ymin=y_cond_q025, ymax=y_cond_q975), alpha=0.25) +
coord_fixed() +
labs(x = "Observed", y = "Predicted")
Sta 344/644 - Fall 2023